Ranking-Constrained Keyword Sequence Extraction from Web Documents

Authors

Ding-Yi Chen

Xue Li

Jing Liu

Xia Chen

Abstract

Given a large volume of Web documents, we consider the problem of finding, for each document, the shortest keyword sequence such that, when the sequence is submitted to a given search engine, the corresponding Web document is identified and ranked first in the results. We call this system an Inverse Search Engine (ISE). Whenever a shortest keyword sequence is found for a given Web document, the given search engine returns that document as its first result for the sequence. The resulting keyword sequence is search-engine dependent. The ISE can therefore be used as a tool to manage Web content in terms of the extracted shortest keyword sequences. In this way, the traditional keyword extraction process is constrained by the document ranking method adopted by the search engine. The significance is that all Web-searchable documents on the World Wide Web can then be partitioned according to their keyword phrases. This paper discusses the design and implementation of the proposed ISE. Four evaluation measures are proposed and used to show the effectiveness and efficiency of our approach. The experimental results establish a test benchmark for further research.
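The abstract does not describe the extraction algorithm itself, so the following is only a minimal brute-force sketch of the problem statement, not the authors' method. It assumes a hypothetical toy search engine (`toy_search`) that scores documents by summed term frequency and ignores keyword order. The names `toy_search`, `ranks_first`, `shortest_keyword_sequence`, and the tiny corpus are all illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def toy_search(query, corpus):
    """Hypothetical stand-in for a search engine (an assumption, not the
    engine used in the paper). Keeps documents containing every query term
    and ranks them by summed term frequency, highest first."""
    scored = []
    for doc_id, text in corpus.items():
        counts = Counter(text.lower().split())
        if all(counts[term] > 0 for term in query):
            scored.append((doc_id, sum(counts[term] for term in query)))
    # Sort by descending score, then by doc id for a deterministic order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


def ranks_first(doc_id, query, corpus):
    """True only if doc_id is the strict (untied) top result for query."""
    results = toy_search(query, corpus)
    if not results or results[0][0] != doc_id:
        return False
    return len(results) == 1 or results[0][1] > results[1][1]


def shortest_keyword_sequence(doc_id, corpus, max_len=3):
    """Try candidate keyword tuples in order of increasing length and
    return the first one that ranks doc_id first, or None if none does."""
    vocab = sorted(set(corpus[doc_id].lower().split()))
    for length in range(1, max_len + 1):
        for query in combinations(vocab, length):
            if ranks_first(doc_id, query, corpus):
                return query
    return None


# Illustrative corpus (made up for this sketch).
corpus = {
    "d1": "web search",
    "d2": "web keyword",
    "d3": "keyword search",
    "d4": "inverse search engine",
}

print(shortest_keyword_sequence("d1", corpus))  # ('search', 'web')
print(shortest_keyword_sequence("d4", corpus))  # ('engine',)
```

In this toy setting, `d1` needs two keywords because each of its words alone ties with another document, while `d4` is identified by a single word. The result depends entirely on the scoring function, which mirrors the abstract's point that the extracted sequence is search-engine dependent. A real search engine would also be queried over a network, so exhaustive enumeration like this would likely be too expensive at Web scale.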

Similar resources

Toward Network-based Keyword Extraction from Multitopic Web Documents

In this paper we analyse the selectivity measure calculated from the complex network in the task of the automatic keyword extraction. Texts, collected from different web sources (portals, forums), are represented as directed and weighted co-occurrence complex networks of words. Words are nodes and links are established between two nodes if they are directly co-occurring within the sentence. We ...

Graph-Based Keyword Extraction for Single-Document Summarization

In this paper, we introduce and compare two novel approaches, supervised and unsupervised, for identifying the keywords to be used in extractive summarization of text documents. Both approaches are based on a graph-based syntactic representation of text and web documents, which enhances the traditional vector-space model by taking into account some structural document features. In...

Relevant Pages in semantic Web Search Engines using Ontology

In general, search engines are the most popular means of searching for any kind of information on the Internet. Typically, keywords are given to the search engine and the Web database returns the documents containing the specified keywords. In many situations, irrelevant results are returned for the user query because different keywords are used in different forms in various documents. The devel...

Ontology driven Pre and Post Ranking based Information Retrieval in Web Search Engines

With the tremendous growth of the World Wide Web, it has become necessary to organize information in such a way that end users can more easily find the information they want efficiently and accurately. This requires a pre-ranking of the underlying similar documents after the formation of the index. Thereafter the ranking of the search results in response to a query takes place wh...

Publication date: 2009
